Convolutional Neural Networks basics

and an example with classification

Convolutional Neural Networks (CNN, ConvNet)

  • a class of artificial neural networks
  • widely used in computer vision*
  • learn a hierarchy of features, from simple (edges) to complex (object parts)

*Computer vision is an essential, complex, widespread, and continually developing part of AI. Computer vision tasks include:

  • Image Classification
  • Classification with localization
  • Object detection (find and localize specific objects in an image)
  • Image captioning
  • Action classification
  • Semantic segmentation
  • Instance segmentation
  • Neural Style Transfer

Why CNNs?

  • Inputs need not be restricted to small images.

  • Example: a 1300 × 1300 px image fed into a fully connected (Dense) layer with e.g. 1000 hidden units $\rightarrow$ weight matrix of shape (1000, 1300 × 1300) = 1.69E9 parameters (×3 for a color image) $\rightarrow$ computationally expensive and data hungry.
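The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper names and the comparison layer (32 filters of size 3×3) are illustrative assumptions, not values taken from the slides.

```python
# Parameter count of one fully connected layer vs. one convolutional layer.
# The conv-layer settings (32 filters of size 3x3) are illustrative assumptions.

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv2d_params(filter_h, filter_w, in_channels, n_filters):
    """Each filter has filter_h * filter_w * in_channels weights plus one bias."""
    return (filter_h * filter_w * in_channels + 1) * n_filters

height, width = 1300, 1300
fc_gray = dense_params(height * width, 1000)       # grayscale input
fc_rgb = dense_params(height * width * 3, 1000)    # color input
conv_rgb = conv2d_params(3, 3, 3, 32)              # independent of image size

print(f"Dense, grayscale input: {fc_gray:,}")
print(f"Dense, RGB input:       {fc_rgb:,}")
print(f"Conv2D 32 x (3x3), RGB: {conv_rgb:,}")
```

The convolutional layer's parameter count does not depend on the image size, because the same small filter is reused at every position.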

Convolution Operation

Example with edge filters

(edge detectors)

In [26]:
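import numpy as np
import PIL.Image
import matplotlib.pyplot as plt
from numpy import asarray
from scipy import signal
# Load the image and convert it to grayscale ('L' mode), then to a 2D array
image = PIL.Image.open('brick2.jpeg').convert('L')
brick = asarray(image)
plt.figure(figsize=(20,18))
plt.imshow(brick,cmap='gray')
plt.axis('off')
plt.show()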
In [27]:
image = plt.imread('filters_conv.jpeg')
plt.figure(figsize = (20,20))
plt.imshow(image)
plt.axis('off')
plt.show()
In [8]:
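# filter_h and filter_v: 2D horizontal- and vertical-edge filters, assumed to be defined in an earlier cell (not shown)
out_conv_h=signal.convolve2d(brick,filter_h,mode='valid')
out_conv_v=signal.convolve2d(brick,filter_v,mode='valid')
# Plot the absolute response so that edges of either polarity show up bright
f, axarr = plt.subplots(1,2,figsize = (50,50))
axarr[0].imshow(np.absolute(out_conv_h),cmap='gray')
axarr[0].set_axis_off()
axarr[1].imshow(np.absolute(out_conv_v),cmap='gray')
axarr[1].set_axis_off()

plt.show()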
In [10]:
# No grayscale conversion here, so the image keeps its color channels
image = PIL.Image.open('brick.jpeg')
brick = asarray(image)
plt.figure(figsize=(10,10))
plt.imshow(brick)
plt.axis('off')
plt.show()
In [11]:
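# filter_v_3d and filter_h_3d: 3-channel versions of the edge filters, assumed to be defined in an earlier cell (not shown)
# In 'valid' mode the filter spans all channels, so the channel axis of the output has length 1
out_corr_v=signal.correlate(brick,filter_v_3d,mode='valid')
out_corr_h=signal.correlate(brick,filter_h_3d,mode='valid')

f, axarr = plt.subplots(1,2,figsize = (50,50))
axarr[0].imshow(np.absolute(out_corr_h),cmap='seismic')
axarr[0].set_axis_off()
axarr[1].imshow(np.absolute(out_corr_v),cmap='seismic')
axarr[1].set_axis_off()
plt.show()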
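
The cells above rely on filters defined elsewhere in the notebook. As a self-contained illustration, the sketch below applies Sobel-like filters to a small synthetic image. The filter values and the test image are assumptions chosen for the example and may differ from the notebook's own filter_h and filter_v.

```python
import numpy as np
from scipy import signal

# Assumed Sobel-like 3x3 filters; the notebook's own filters may differ.
filter_v = np.array([[1, 0, -1],
                     [2, 0, -2],
                     [1, 0, -1]])   # responds to vertical edges
filter_h = filter_v.T               # responds to horizontal edges

# Synthetic 6x6 grayscale image: dark left half, bright right half (one vertical edge).
img = np.zeros((6, 6))
img[:, 3:] = 10.0

out_v = signal.convolve2d(img, filter_v, mode='valid')
out_h = signal.convolve2d(img, filter_h, mode='valid')

print(out_v.shape)      # 'valid' mode shrinks 6x6 to 4x4
print(np.abs(out_v))    # non-zero only in the columns around the vertical edge
print(np.abs(out_h))    # all zeros: the image has no horizontal edge

# Convolution flips the kernel, cross-correlation does not:
# convolving with a kernel equals correlating with the flipped kernel.
flipped = filter_v[::-1, ::-1]
assert np.allclose(out_v, signal.correlate2d(img, flipped, mode='valid'))
```

Deep learning libraries usually implement cross-correlation and still call it convolution. Since the filter weights are learned, the flip makes no practical difference.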

Padding

Strided Convolutions
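
Padding and stride together determine the size of the output feature map. The sketch below uses the standard formula for one spatial axis, with input size n, filter size f, padding p and stride s; the function names are hypothetical helpers introduced for this example.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def same_padding(f):
    """Padding that keeps the size unchanged for stride 1 and odd filter size f."""
    return (f - 1) // 2

valid_size = conv_output_size(6, 3)                      # no padding ('valid')
same_size = conv_output_size(6, 3, p=same_padding(3))    # 'same' padding
strided_size = conv_output_size(7, 3, s=2)               # stride 2, no padding

print(valid_size, same_size, strided_size)
```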

Convolutions on volumes (RGB images)
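
For a multi-channel input, each filter has the same depth as the input and produces a single 2D feature map; the outputs of several filters are stacked along a new channel axis. The following is a minimal sketch with random data. The array sizes and the number of filters are arbitrary assumptions.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))                  # height x width x channels
filters = rng.standard_normal((2, 3, 3, 3))    # 2 filters, each 3 x 3 x 3

# Each filter spans all input channels, so 'valid' correlation leaves a
# channel axis of length 1; dropping it gives one 2D feature map per filter.
maps = [signal.correlate(image, f, mode='valid')[:, :, 0] for f in filters]
output = np.stack(maps, axis=-1)

print(output.shape)   # (4, 4, 2): one output channel per filter

# One output value by hand: elementwise product of a 3x3x3 patch with filter 0, summed.
manual = np.sum(image[0:3, 0:3, :] * filters[0])
assert np.isclose(output[0, 0, 0], manual)
```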

Layers in ConvNets

The building blocks (i.e. the most commonly used types of layers in CNNs)

  • Convolutional layers
  • Pooling layers
    • Max Pooling
    • Average Pooling
    • Global Pooling
  • Dense (Fully Connected) layers
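
The three pooling variants listed above can be illustrated on a tiny array. This is a plain NumPy sketch for a single-channel input, not how a deep learning library implements pooling; `pool2d` is a hypothetical helper written for this example.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2D array with a square window (no padding)."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce_fn(window)
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 0],
              [7, 2, 9, 8],
              [3, 4, 6, 5]], dtype=float)

max_pooled = pool2d(x, mode="max")        # [[6, 4], [7, 9]]
avg_pooled = pool2d(x, mode="average")    # [[3.75, 1.75], [4.0, 7.0]]
global_max, global_avg = x.max(), x.mean()  # global pooling: one value per feature map

print(max_pooled)
print(avg_pooled)
print(global_max, global_avg)
```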

Schematic Description of a ConvNet

CNNs in TensorFlow

  • Conv Layer (2D)

  • Conv Block

  • Simple Conv Net

  • Conv2D(#filters, (fx, fy), activation='relu',input_shape=(height,width,channels))
    • Conv2D(32,(3,3),activation='relu',input_shape=(240,240,3)) [if this is the input layer]
    • Conv2D(128,(5,5),activation='relu')
  • MaxPooling2D(pool_size=(2, 2),strides=1) [if strides is omitted, it defaults to pool_size]
  • Flatten() [no arguments needed]
  • Dense(#output_units,activation)
    • Dense(512,activation='relu')
    • Dense(N, activation='softmax') [if last layer of N-class classification]
    • Dense(1,activation='sigmoid') [if last layer of binary classification; the single output is the probability of the positive class]
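
Put together, the layers above form a small ConvNet. The following is a minimal sketch using the Keras API bundled with TensorFlow. The function name, the 64×64 input size (chosen to keep the model small), the optimizer and the loss are assumptions made for illustration, not settings from the slides.

```python
from tensorflow.keras import layers, models

def build_simple_convnet(input_shape=(64, 64, 3), num_classes=8):
    """Two conv blocks (Conv2D + MaxPooling2D) followed by a dense classifier."""
    model = models.Sequential([
        layers.Input(shape=input_shape),            # (height, width, channels)
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(128, (5, 5), activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    # Assumed training setup: integer class labels and the Adam optimizer.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_simple_convnet()
model.summary()
```

Here the input shape is declared with a separate `Input` layer rather than the `input_shape` argument of the first `Conv2D`; both describe the same thing.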

Hands-on

  • Simple classification example using a ConvNet

Classification/Regression

  • Regression:
    • output is a prediction of a quantity that takes continuous values
  • Classification:
    • output is a prediction on a class, i.e. discrete labels
      • binary classification
      • multi-class classification
        • single-label classification
        • multi-label classification
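
The difference between single-label and multi-label classification shows up in the output activation. A minimal NumPy sketch, with made-up scores for three classes: softmax makes the classes compete (exactly one label), while independent sigmoids allow several labels at once.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D array of scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 0.5, -1.0])   # hypothetical raw scores for 3 classes

p_single = softmax(logits)    # single-label: probabilities sum to 1
p_multi = sigmoid(logits)     # multi-label: each class is an independent yes/no

predicted_class = int(p_single.argmax())   # the one predicted class
predicted_labels = p_multi > 0.5           # the set of predicted labels

print(p_single, p_single.sum())
print(predicted_class)
print(predicted_labels)
```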

Steps

  1. Data
    • set of 8 different classes of images: airplane, car, cat, dog, flower, fruit, motorbike, person
    • create the relevant datasets
  2. CNN model
    • build the CNN model, train it with the given data, check performance
    • we will observe overfitting
  3. Improve the previous CNN model, i.e. reduce overfitting
    • Dropout (modifies the model architecture)
    • Augmentation (modifies the training data)
  4. Transfer learning
    • Do not train our own CNN from scratch; borrow a pretrained model and use it on our data.

Dropout

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", JMLR 15 (2014), http://jmlr.org/papers/v15/srivastava14a.html
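
The mechanism can be sketched in a few lines of NumPy. Note that this is the "inverted dropout" variant, which rescales the surviving activations during training (the behaviour of the Keras Dropout layer); the paper above instead rescales the weights at test time. The function name and the array sizes are assumptions for the example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return activations              # at inference time nothing is dropped
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
a = np.ones((1000, 100))
dropped = dropout(a, rate=0.5, rng=rng)

print((dropped == 0).mean())   # fraction of zeroed units, close to 0.5
print(dropped.mean())          # close to 1.0: the expected activation is preserved
```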

Augmentation

Augmenting the training dataset by applying transformations to existing data (images in this case): flip, rotate, shear, shift, zoom, color distortion, etc.
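
A few of these transformations can be sketched directly in NumPy. This is only an illustration under simplifying assumptions: square images with values in [0, 1], rotations restricted to multiples of 90 degrees, and shifts that wrap around the border. The `augment` helper is hypothetical.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly transformed copy of an (H, W, C) image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:                            # random horizontal flip
        out = out[:, ::-1, :]
    k = int(rng.integers(0, 4))                       # rotate by k * 90 degrees
    out = np.rot90(out, k=k, axes=(0, 1))
    dy, dx = (int(v) for v in rng.integers(-4, 5, size=2))
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))   # shift by up to 4 pixels (wraps around)
    brightness = rng.uniform(0.8, 1.2)                # mild brightness distortion
    return np.clip(out * brightness, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                       # stand-in for a real square image
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)
```

In practice one would typically rely on library support instead, for example the Keras preprocessing layers RandomFlip, RandomRotation and RandomZoom.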

Transfer Learning
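
A common way to borrow a pretrained model in Keras is to freeze its convolutional base and train only a new classification head. The sketch below assumes MobileNetV2 as the pretrained network, a 160×160 input and 8 output classes; the slides do not say which network is used, so these choices are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(num_classes=8, input_shape=(160, 160, 3), weights="imagenet"):
    """Frozen pretrained base plus a small trainable classification head."""
    # Pretrained convolutional base without its original classifier.
    # weights="imagenet" downloads the pretrained weights on first use.
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False                            # freeze the borrowed features

    inputs = layers.Input(shape=input_shape)
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)  # MobileNetV2 expects inputs in [-1, 1]
    x = base(x, training=False)                       # keep BatchNorm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)
```

Calling `build_transfer_model()` and then compiling and fitting the result trains only the final Dense layer; the base can be unfrozen later for fine-tuning with a small learning rate.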